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Problem  Definition 


A.  Background 


Passive  surveillance  system  performance  in  terms  of 
area  coverage  is  ultimately  limited  by  the  noise  environ¬ 
ment  in  which  it  must  operate.  As  the  state  of  the  art  in 
design  has  progressed,  interest  is  no  longer  exclusively 
in  the  omni-level  of  the  noise  field  but  now  includes  as 
equally  important  the  spatial,  temporal  and  directional 
characteristics  of  this  field.  Additionally,  extreme  concern 
with  the  low  frequency  portion  of  the  noise  spectrum  develops 
as  the  need  for  enhanced  classification  capability  increases. 
The  predominant  component  of  the  low  frequency  noise  field 
for  most  platforms  is  that  of  ship  noise.  Thus,  as  the 
need  for  increased  passive  detection  and  classification 
capability  increases,  one  is  forced  to  consider  the  low 
frequency  portion  of  the  spectrum.  In  turn,  this  implies 
the  need  to  quantitatively  specify  the  ambient  noise  envir¬ 
onment. 

Shipping  distributions  play  a  dominant  role  in  the 
calculation  of  low  frequency  ambient  noise,  since  these 
distributions  affect  both  the  level  and  the  directional 
characteristics  of  this  noise  field.  The  geographical 
distribution  of  ships  throughout  the  ocean  basin  gives 

rise  to  the  directionality,  both  horizontal  and  vertical.  _ 

The  variation  of  transmission  loss  with  range  and  bearing 
serves  to  accentuate  or  moderate  this  directional  charac¬ 
ter,  and  also  determines  the  level  of  the  shipping  contri¬ 
bution  to  ambient  noise  field  at  the  receiver.  Both  the 
ambient  noise  and  the  transmission  loss  are  a  function 
of  the  depth  of  the  receiver  as  substantiated  by  the 
comparison  of  simulation  models  and  experimental  results. 


It  is  clear  that  all  ships  throughout  an  ocean  basin 
contribute  to  the  ambient  noise  field.  These  individual 
contributions  will  be  more  or  less  significant  to  the  level 
and  directionality  of  the  ambient  noise  field  depending 
upon  the  transmission  loss  from  the  ship  to  the  receiver. 
Near  ships,  i.e.,  those  within  50  miles,  may  not  be  more 
important  than  far  ships,  i.e.,  200-500  miles,  depending 
on  properties  of  the  various  convergence  zones.  Calcula¬ 
tions  clearly  show  that  both  the  level  and  directional 
characteristics  of  ambient  noise  are  sensitive  to  the  total 
shipping  distribution,  including  both  near  and  far  ships. 

Historical  shipping  fields  are  available  and  serve 
as  a  basis  for  planning  and  preliminary  calculations  in 
the  determination  of  real  time  shipping  distributions. 
Experience  backed  by  empirical  results  demonstrates  that 
for  model  validation  and  ambient  noise  determination  for 
particular  sites  in  a  particular  experiment  an  accurate 
and  timely  observation  of  the  true  total  shipping  distri¬ 
bution  must  be  employed  to  obtain  satisfactory  results. 

In  the  near  term  this  distribution  is  obtainable  only  by 
the  use  of  real  time  aircraft  surveillance. 

This  report  addresses  the  problems  and  approaches 
involved  in  reducing  this  observational  data  obtained  from 
aircraft  surveillance  in  order  to  obtain  a  representative 
shipping  field  for  the  time  frame  of  the  experiment. 

B.  Basic  Problems 

The  raw  data,  of  course,  is  highly  dependent  upon  the 
flight  tracks  and  equipment  performance.  This  implies 
that  if  the  region  of  interest  is  divided  into  rectangles 
of  some  preset  size  (e.g.,  1°  x  1°)  then  the  coverage 
afforded  to  these  squares  is  somewhat  uneven.  Specifically 


on  a  flight  day  a  square  may  have  multiple  or  only  partial 
coverage,  i.e.,  a  plane  may  cover  the  square  more  than  once 
(by  covering  it  on  two  legs  of  the  flight) ,  or  may  only 
cover  a  fraction  of  the  square  (if  it  is  not  directly  under 
the  flight  path) .  Different  flight  days  will  usually 
utilize  different  tracks,  resulting  in  differences  in  the 
area  being  covered.  Various  portions  of  the  overall  survey 
area  have  much  higher  priority  than  others  (viz,  a  line 
containing  the  source  and  receiver  ships  in  an  acoustic 
experiment) .  Also,  different  planes  will  often  be  used. 

The  equipment  on  board  such  planes  can  vary  greatly  in 
quality  and  reliability.  The  planes  may  even  have  differ¬ 
ent  types  of  equipment.  And  different  radar  operators  will 
account  for  more  differences  in  data;  a  good  operator  can 
' tweek'  his  set  with  fine  adjustments  to  a  high  level  of 
quality?  another  operator  might  not  be  able  to  do  any  'fine 
tuning ' . 

Thus,  the  raw  data  gathered  is  not  completely  uniform; 
indeed  in  some  cases  quality  variations  may  be  extreme. 

Also  there  are  historical  shipping  fields  of  various  quality 
which  may  be  utilized.  From  such  an  accumulation  of  data 
one  wishes  to  construct,  using  an  algorithm,  a  representa¬ 
tive  shipping  field. 

C.  Techniques 

There  are  innumerable  possible  ways  of  answering  the 
question  "How  should  one  estimate  density?"  In  light  of 
the  types  of  data  available,  the  problems  mentioned  above, 
and  the  constraints  on  the  analysis  (viz,  a  limited  number 
of  surveillance  flights  and  a  rapid  response  in  terms  of 
the  overall  shipping  field) ,  there  are  two  algorithms  which 


appear  most  suitable.  The  choice  of  which  of  these  algo¬ 
rithms  to  use  in  analyzing  a  specific  set  of  data  depends 
on  the  attributes  of  that  data  set  (e.g.,  the  quality  and 
quantity  of  the  data,  the  conditions  under  which  various 
subsets  of  it  were  taken) .  These  algorithms  are  based  on 
both  theory  and  practice,  and  have  been  utilized  in  the 
analysis  of  surveillance  data  from  exercises. 

The  first  one  is  the  weighted  average  algorithm, 
which  is  described  in  Section  II.  If  the  data  is  very 
limited,  this  algorithm  should  be  used,  which  will  provide 
the  (weighted)  mean  and  variance  of  the  shipping  density 
in  each  square.  This  algorithm  is  quite  rapid  on  a  pro¬ 
grammable  desk  top  electronic  calculator.  If  more  data 
is  available,  or  the  data  is  of  very  uneven  quality,  then 
the  other  algorithm  should  be  utilized.  This  algorithm, 
which  can  utilize  incomplete  data,  is  described  in  Section 
III.  The  theory  underlying  this  algorithm  is  presented  in 
Section  IV. 

Like  any  mathematical  or  statistical  tool,  it  can  be 
misapplied.  If  data  is  uniformly  poor  or  worthless,  no 
algorithm  will  convert  it  into  an  accurate  prediction 
(the  computer  maxim  of  GIGO  is  completely  applicable) . 

If  data  is  very  scanty  (e.g.,  one  day  of  observation),  there 
is  no  point  in  utilizing  such  an  algorithm;  there  is  little 
or  nothing  to  cross-correlate.  In  such  a  situation  the 
former  algorithm  should  be  utilized. 

Typical  results  of  the  weighted  average  algorithm 
are  given  in  Section  II.  An  example  of  the  other  algorithm 
as  applied  to  actual  data  is  given  in  Section  V. 


The  description  of  the  weighted  average  algorithm 
(Section  II  )  is  taken  from  PSI  report  TR-004002,  while 
the  description  of  the  partial  data  set  algorithm  comes 
from  TR-004012. 


II .  Weighted  Average  Algorithm 

It  is  necessary  to  assure  that  the  shipping  field  density 
during  the  course  of  the  exercise  is  stationary,  i.e.,  not 
time  dependent-  It  is  then  of  interest  to  consider  the  average 
density.  A  moments  reflection  will  indicate  that  the  arithmetic 
mean 


1  n 

A  =  ~  Z  d.  (1) 

n  1  x 

has  certain  deficiencies.  For  example,  suppose  on  day  1  an 
area  was  covered  at  three  different  times  by  different  aircraft, 
but  on  day  2  only  one  aircraft  was  in  the  area,  and  it  only 
covered  half  the  square.  The  above  formula  would  attack  equal 
significance  to  the  computed  values  of  the  ship  density  on  those 
two  days,  whereas  the  estimate  of  the  density  on  day  1  is  (sta¬ 
tistically)  superior  to  that  of  day  2.  For  this  reason  a 
weighted  average  will  be  used. 

Before  plunging  into  the  relevant  equations,  several  quanti 
ties  will  be  defined.  Fix  an  area  of  the  ocean  (for  the  com¬ 
putations  5°  x  5°  squares  will  be  used) ,  and  let  n^  be  the 
number  of  ships  observed  on  the  ith  day  (i  -  1,2,. ..,n),  and 
let  p^  be  the  proportion  of  the  area  covered  on  the  ith  day. 

If  p.  ^  0  then  an  estimate  of  the  density  of  ships  .in  the  area 
on  the  ith  day  is  given  by 

d.  =  n./p-  (2) 

x  x'  *1 

For  later  convenience,  define 

n 

s  =  l  p.  (3a) 

i  =  l 


(3b) 


The  weighted  average  has  the  form 


(4) 


where  w^  and  k  are  constants  (the  constant  k  is  introduced  so 
that  the  w^  may  have  simpler  form) .  In  order  to  attach  the  same 
importance  to  each  square  mile  which  was  covered,  the  weight  must 
be  proportional  to  the  amount  of  area  covered,  i.e.,  w.  =  p^. 

The  constant  k  is  chosen  such  that  d  is  an  unbiased  estimator. 
That  is,  if  d^  are  independent  samples  of  a  random  variable 
(called  "density")  having  mean  y ,  then  k  is  chosen  such  that  the 
expected  value  of  d  is  y,  i.e.. 

Eld]  =  y  (5) 

which  implies  k  -  s.  But  w\d^  =  p^d^  —  so 

d  =  (X>i)/S  ‘  (6) 


For  similar  reasons,  a  weighted  estimator  of  variance  is 
desirable.  This  estimator  has  the  form 

cr  =  k  X>.(d.  -  d)  (?) 

k  11 

A 

Letting  w^  =  p^  gives  a  system  consistent  with  equation  (6) .  This 
system  favors  the  larger  area  coverage,  but  does  not  place  inordi- 

A 

nate  emphasis  on  the  days  with  maximum  coverage.  Choosing  k  such 
that  a  is  an  unbiased  estimator  gives 


S2  =  - 


2 

s 


/  T  2 

(  hi  n. 


./p-  d  s) 
x  x 


where  nf/p^  =o  when  •-  o.  The  (weighted)  average  density, 

by  5°  x  5°  square,  is  given  in  Figure  1,  and  the  variance  is 
given  in  Figure  2,  for  data  gathered  recently  in  the  Pacific. 
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III.  Algorithm  for  Estimating  the  Monthly  Shipping  Density 
Averages  from  a  Partial  Data  Set 

A.  Introduction 

The  shipping  density  D(x,  y,  t)  at  a  given  ocean  locality 
is  a  stochastic  function  of  time.  Conceptually,  it  is  defined 
as  a  number  of  ships  per  "unit"  area.  In  practice,  the  unit 
area  is  taken  to  be  a  1/2°  x  1/2°  square  in  the  ocean,  thus 
D (x,  v,  t)  for  any  point  in  the  grid  is  simply  the  number  of 
ships  in  the  grid.  Thus  instead  of  a  continuous  varying  point 
(x,  y)  in  the  ocean,  we  have 

D{x,  y,  t)  *  D(Si,  t)  (1) 

where  (x,  y)  e 

and  S^,  i  =  1,  2 , . . . ,N  is  a  partition 

of  the  given  ocean  area  into  1/2®  x  1/2°  squares.  Our  problem 
is  to  estimate  the  time  averages 

JrT„ 

1  D(S.  ,  t)  dt  (2) 

T0 


In  the  present  problem  we  take  -  TQ  --  one  month 

Now  D(S^,  t)  is  observed  for  only  a  few  points  in  time  t ^ , 

j  =  1 , . . .  , M.  In  fact,  M  <.  4.  Furthermore  the  set  of  observed 
at  a  given  time  t4  may  or  may  not  cover  the  entire  ocean  area.  We 
^ha  1.1  assume,  that  there  is  at  least  one  time-,  t-^,  for  example,  when 
the  observation  (by  radar) ,  covers  the  entire  ocean  area.  The 
observation  of  the  ocean  area  at  a  given  time  tj  may  be 
represented  by  an  N  x  M  matrix  C,  i.e.. 


C (i ,  j)  =  (C(Si,  t.)]  i  =  1,  2,...,N;  j  =  1,...,M  (3) 


where  C  (S .  ,  t. )  -  number  of  tirn.es  square  S.  is  observed. 

i  j  l 


*r~- 


1 

JL- 


Of  course  C(i,  j)  >.0.  And  we  stipulate  that  for  j  =  1, 

C(i,  j)  >.  1.  Corresponding  to  C(i,  j),  we  have  the  ship  matrix 
S ,  i.e. , 


S(i,  j)  =  number  of  ships  reported  to  be  in 
at  time  t. 


Then  the  Density  Matrix  D  is  given  by 


D (i/  j)  =  S  (i ,  j)/  C  (i ,  j)  (4) 

where  only  elements  of  D(i,  j)  for  which  C(i,  j)  ^  0  are  defined. 
Thus  D  is  an  incomplete  matrix,  The  problem  is  then  to  estimate 
the  vector  D, 


D  =  (D (1) , . . . D (N) )  from  the  matrices  C  and  S. 


B.  Solution 

The  following  approach  is  adopted. 


Step  1:  Complete  the  matrix  D  from  the  data  C  and  S, 
by  method  of  least  square. 

M 

[£  w.D(i,  j) ] , 

i=l:i 


II 

vH 

<  Q 

1 

M 

where 

II  '*'■  l» 

(l) 


are  weights. 

C.  Details  of  Step  1 


Recall  that  the  first  column  of  D  is  complete,  and  for 
simplicity,  we  assume  that  all  other  columns  of  D  are  incomplete; 
we  shall  complete  the  other  incomplete  columns  so  as  to  maximize 
the • correlation  of  the  j-th  column  of  D 


b_.  =  (D(i,  j),  i  =  1,  2 ,  .  .  .  ,  N }  with  the  first 

column,  and  with  each  other.  And  this  is  done  by  means  of 
weighted  least  square.  More  precisely,  we  maximize  the  follow¬ 
ing  function  p. 


(2) 


M 


M 


p  =  t  k .  P(D1  ,  D.)  +  Z  k..  p  (D .  ,  D.) 
j=2  3  X  3  i-2  13  1  3 

where  p(D^,  D^)  is  the  proportionality  factor  between  and 

and  k's  are  some  sort  of  weighting  factors.  The  "non-existent" 
D(i,  j)  are  variables  to  be  estimated  by  maximizing  p,  i.e.,  by 
setting 


- 0 


(3) 


One  equation  for  each  missing  D(i,  j)  (=  x)  and  k^  are  some 
weighting  factors.  However  to  simplify  the  calculation,  we 
tentatively  set  k^  =  0.*  Then  p  is  maximized  by  separately 
maximizing 


Pj  =  P  (D]_ /  Dj) 


(4) 


i.e.,  by  setting 


35T=  0 


(5) 


one  equation  for  each  missing  D(i,  j)  (=  x)  in 

We  further  specify  the  form  of  p^  so  that  for  a  given  miss¬ 
ing  D(i,  j)  (=  x) ,  equation 


3p  • 


3x 


—  =  0  is  of  the  following  form 


~  [E  C  (i ,  1)  C  (i ,  j)  (D  (i  ,  1)  x-D  (i ,  j)‘*x.)  ] 
ieE  j 


(6) 


where  E^  =  the  set  of  i  e[l,2,...,N]  for  v;hich  D(i,  j)  exist 

and  x1  =  D(i,  1),  the  element  in  the  first  column  corresponding  to 

J*  tVi 

to  the  missing  element  in  the  j —  column 

x  =  D  (i,  j  )  . 


*From  the  fact  that  many  elements  of  D. ,  D.  are  missing,  we  ex¬ 
pect  k—  <<  .  Hence  as  a  first  approximation,  k^  ..  =  0. 


13 


13 


The  complete  form  of  is 


p.  =  Z  (  Z  C (i ,  1)  C  ( i ,  j)  D ( i , 1)  x  -D (i ,  j)  x.,) 

3  x,y ,  z  |  ieEj 


2\ 


(7) 


From  (6)  we  deduce  immediately* 


2  C(i,  1)  C(i,  j)  D (i /  j) 

_  1£Ei  _  _ 

x,  2  C (i ,  1)  C(i,  j)  D( i ,  1)  D(i,  j)  aj 
1  ieE. 


(8) 


Thus  we  see  that  a^  is  the  same  for  all  x  -  D(i,  j)  in  5^ 
Thus  we  have  simply 

D(i,  j)  =  a..  D(i,  1) 


(9) 


for  all  i  i  Ej.  Thus  the  missing  D(i,  j)  can  be  estimated  for 
each  j  =  and  the  monthly  average  computed  by 


M 

D (i)  -  (2  D  (i ,  j))/M 

j=l 


(10) 


*"This  is  the  benefit  of  assuming  k .  .  =  0,  otherwise  we  have  to 
solve  a  simultaneous  system  of  .linear  equations 
iterative  method. 


or  use  an 


D.  General  Algorithm 


Section  C  treated  the  special  case  where  it  was  assumed 
that  the  first  column  of  the  D  matrix  (which  corresponded  to 
flight  day  'one')  was  complete,  and  that  all  other  columns  were 
incomplete.  That  is,  it  was  assumed  that  one  of  the  flight  days 
obtained  total  coverage  of  the  area  and  all  other  days  only  had 
partial  coverage.  This  is  the  simplest  case,  and  was  used  to 
illustrate  the  technique  involved.  In  practice,  of  course,  the 
situation  is  much  more  complex.  The  number  of  complete  area 
coverage  days  may  be  much  greater  than  one,  or  it  may  be  zero. 

Also,  the  simplified  case  presented  above  assumed  that  all 
days  had  equal  weighting;  this  is  rarely  the  case,  for  reasons 
mentioned  in  section  I-B.  Thus  appropriate  weights  must  be 
computed.  As  was  done  in  section  II,  weights  are  determined  by 
the  number  of  times  each  flight  covered  the  specific  squares 
which  it  was  surveying. 


V.  Theory  of  Estimating  Ship  Density  Time  Averages  for 


Incomplete  Data 

A.  Introduction 

The  previous  section  presented  an  algorithm  for  estimating 
ship  density  time  averages  from  incomplete  data.  It  was  a 
weighted  least  square  method,  minimizing  a  certain  weighted  sum 
of  squares.  It  turns  out  that  the  computation  of  that  algorithm 
is  quite  simple.  In  this  paper  we  tackle  the  same  problem  from 
a  theoretical  point  of  view.  This  naturally  leads  to  a  maximal 
likelihood  estimate  for  the  density.  It  is  shown  that  the  maxi¬ 
mal  likelihood  estimate  will  tend  to  make  each  component  of  the 
cited  sum  of  squares  small,  thus  vindicating  our  algorithm. 

The  exact  numerical  solution  of  the  maximal  likelihood  esti¬ 
mate  is  much  more  complicated  than  the  algorithm  of  Section  III. 
The  algorithmic  solution  is  utilized  as  an  approximation  to  the 
maximal  likelihood  estimate. 

The  notation  of  this  section  is  consistent  with  that  used 
in  the  description  of  the  algorithm.  In  particular  the  meaning 
of  C,  D,  S,  Sj ,  S/  ,  M,  N,...  are  the  same. 

B.  Theory,  Assumptions  and  Notation 

N 

Recall  that  is  a  partition  of  the  ocean  area  of 

i=l 

interest.  Observations  are  made  at  times  ty  j=l,...,N.  The 
ship  matrix  S  looks  schematically  as  follows 


Fig.  3  Schematic 
View  of  S 


where  E.  is  the  set  of  S.’s  for  which  D(i,j)  exists. 


We  shall  assume  that  for  each  square  S^,  there  is  a  proba¬ 
bility  p(S^)  =  such  that  given  a  ship  in  the  area,  it  will 
be  found  to  be  in  the  square  S^.  We  assume  that  p^  is  constant 
in  the  period  from  t^  to  t^  (i.e.,  we  assume  that  the  relative 
densities  throughout  the  area  do  not  change  in  the  time  period 
under  consideration.)  By  looking  at  the  generating  function 

g(xlf...,xN|K)  =  (Pi^i  + .. .+  pNxn)  (1) 

where 

N 


r 


It  is  seen  that  the  probability  of  there  being  ships  in 
i=l , . . . ,N  is 


Pr(k1, ... ,kN)  =  C (K;  kL 


N 

,k  )  XT  (P. ) 
N  i=1  i 


k. 

i 


(3) 


where 

Z  k.  =  K 

l 


and  C  is  the  multinomial  coefficient. 

Let  there  be  a  total  of  K  ships  in  the  area  and  assume  that 
only  a  subset  E  of  the  squares  in  area  S  have  been  observed  that 
day  (either  full  or  partial  coverage  of  the  square) .  For  con¬ 
venience  let  the  subset  E  of 


S  =  (S 


1' 


*  ,SN} 


be  given  by 

E  =  {Si:i  e  Q} 

where  Q  is  a  subset  of  {1,2,. . . ,N) . 


icQ 


Define 


KE  ieQ  ki 


Pc  =  1  ~  PE 


Kc  =  K  ~  ^ 


Then  the  probability  that  there  are  k^  ships  in  square  for  all 
squares  in  the  subset  E  (gives  a  tctsl  of  K  ships)  is 


Pr  [D(Si)  =  kL  V i£Q I K] 

K  K  K  (K  )  1 

=  (^,<pe,  E<pc. 

ip.Q  i' 


•  1C  (Pj/Pg)11! 

ieQ  1  E 


c  w  i 
C  ieQ  1 


K  k  . 

(pc>  °JI  <p± >  1  . 
ieQ  1 


Denote  this  function  by  Pr(Q|K),  which  is  a  much  simpler,  though 
nonrigorous  (and  somewhat  sloppy)  notation. 

C.  Maximal  Likelihood  Estimates 


The  Likelihood  function  L  foi  the  M  time  observations  in 
the  overall  area  S  is  given  by 


M 

L(S)  =  login  Pr  (Q,  K.) 

1  =  1  -> 


where  is  the  subset  of  indicies  corresponding  to  the  square 
observed  on  the  j-th  day,  and  K_.  is  the  total  number  of  ships  in 
area  S  on  the  j-th  day. 

That  is,  if  denotes  the  subset  of  squares  of  the  set 
S  which  were  covered  on  the  j-th  day  of  observation,  then 


Ej  =  {S^  ieQ.}. 


The  unknowns  in  equation  (1)  are  the  total  number  of  ships 
in  the  area  for  each  day  of  coven  ge  (note  this  is  not  equal 
to  the  number  of  ships  observed  on  those  days) ,  and  the  base 
probabilities  for  each  square. 

These  unknowns  are  determined  as  those  values  which  maxi¬ 
mize  L(S) ,  i.e.,  the  solution  of  the  following  system 


—  [L(S)  +  X  (1  -  Z  p  .  )  ]  = 
j  i-1  1 


for 


0 

j  E’ 


3  Pi 


-EL (S)  +  A  (1 


N 

-  £  Pi)] 

i=l  1 
for 


0 

l<i<.N 


>(2) 


where,  of  course,  we  have  the  constraining  relationship 


N 

S 

i=l 


»  1 


The  set  J  consists  of  indicies  of  all  days  which  did  not  cover 
(full  or  partially)  every  square,  i.e. 

J  =  { j  :  1  <.j  and  E ^  ^  S}. 

Or,  more  explicitly,  the  system  of  equations  is: 


[log  K.  !  -  log  k  .!  +  k.  log  (1-q.)]  =  0 

j  J  J  J 

for  jcJ 


e  Z  D(i, 3)  log  P, J 

dpi  i=l  je$  1 


+  1  K-i  l9g  +  E  Ki  log  ^-s-s)] 

3pi  jeJ  3  3  jeJ  3  3 


+  E  »T  l  E  D(ifj)log  p.] 
j£j  3pi  ieQ.  1 


-  A  =  0 


for  l<.i<N 


where  we  define  (using  notation  from  section  III) 


K.  =  D ( j )  =  Z  D(i,j) 
3  i=l 


K.  =  D„(j)  =  Z  D' 
3  E  ieQj 


K.  =  K .  -  K. 

3  3  3 


4>  =  {j:lij£M}  -  J 


qj  =  Pi 
3  ieQ  j 


That  is,  is  the  partial  sum  of  ships,  its  difference  from 
the  total  sum  K ^ ,  $  is  the  complement  of  J  and  q^  is  the  partial 
sum  of  probabilities.  The  quantity  A  is  the  Lagrangian  multi¬ 
plier  arising  from  the  restraining  equation. 

Without  loss  of  generality  we  may  assume  that  the  set  J  has 
exactly  one  element,  and  this  element  will  be  taken  to  be  ’1'. 
For  it  is  assumed  that  on  some  day  all  squares  will  be  covered 
(fully  or  partially);  if  this  is  not  so,  historical  data  can  be 
utilized  to  establish  a  'base'  field,  or  first  overall  estimate. 
(Incorporation  of  historical  data  bases  with  observational  data 


in  these  algorithms  can  be  easily  accomplished.  However  under 
such  conditions  weights  should  definitely  be  used  to  reflect  the 
relative  confidence  in  the  differing  data  types  -  see  section  I) . 
If  J  consists  of  more  than  one  element,  then  those  days  may  be 
combined  (using  appropriate  weighting,  as  is  done  in  section  II) , 
into  a  single  'day'  of  data,  which  is  then  utilized.  In  practice, 
of  course,  the  algorithm  given  in  section  III  performs  properly 
regardless  of  the  size  of  this  artificial  set  'J'.  Therefore, 
there  is  no  loss  of  generality  in  assuming  that  the  set  J  con¬ 
sists  of  the  single  index  '1',  which  is  implicitly  used  in  the 
following  subsections. 

We  shall  solve  the  problem  first  for  the  case  M  =  2 ,  then 
for  the  case  M  =  3.  These  two  specific  cases  will  suffice  for 
illustrative  purposes;  in  practice  the  least  squares  algorithm 
is  always  used,  which  simplifies  computation  immensely. 

D.  The  Case  M  =  2 

To  simplify  notation  we  write  q,  k ,  K  and  E  for  q2 ,  <2,  K2 
and  E2  respectively.  The  requirement  is  to  solve  the 
following  simultaneous  system: 

— ■  [log  K! -log (K-k) i  +  (K-k)  log(l-q)]  =  0 


D(i,  1)/Pi  +  [-|- 


K-K 

1-q 


+ 


D (i , 2) 
Pi 


]  <j>E (i) 


A 


for  lii<N 


N 

l 

i=l 


(4) 


where  <j)  is  the  characteristic  function  of  the  set  E,  i.e. 


Cl  if  i£E 

jjO  if  ifj:E 


The  first  equation  will  be  replaced  by  an  approximation  by  using 
the  Stirling  approximation  for  K! 


K 1  »  /2iK  KK  exp  (-K  +  w  (K) )  , 


(5) 


where 


12K+6  < 


u(Ki 


<  12K 


Hence 


Ki  _  1  tc(2K  -  k ) 
<K-Kir  12  (K- k)  2  K2 


K 


+  lo«  K-ic 


Thus  K  is  a  root  of  the  system: 


1  K 

2  K(K-k) 


1 

12 


K (2K-k) 
K2(K-k)2 


2  k  (k-<)  +  log  k~k  +  lo9d-q)  =  o 


D ( i,  1 )  =  Xpi  for  i^Q 


D(i,l)  +  [(-g-) 

Vi 


-  <£f>  +  ^^iPi  -  ^ 


for  ieQ 


(6) 


The  system  (6)  can  be  solved  iteratively  as  follows. 

Adding  the  last  equations  in  the  system  for  i=l,.„ . ,N  we  get 

D(l)  +  De(2)  +  k  -  •  q  =  X  (7) 

As  an  initial  approximation  to  q,  we  take 

(q)  =  De(1)/D(1)  (8) 

Substituting  this  value  of  q  into  the  first  equation  of  (6)  we 
solve  for  K  and  denote  the  estimate  by  (K) .  Substituting  this 
value  of  K  into  eq.  (7) ,  we  get  (X),  hence  we  get 


Of  course,  as  with  most  iterative  procedures  designed  to 
solve  maximum  likelihood  estimation  problems,  it  is  possible  for 
the  data  to  be  such  that  the  estimates  either  diverge,  oscillate, 
or  converge  to  a  false  position  which  is  not  the  maximum. 

Since  we  are  only  interested  in  the  theoretical  properties 
of  the  problem,  and  do  not  intend  to  use  this  procedure  for 
actual  computation  in  a  production  environment,  it  is  of  no  value 
to  digress  into  techniques  of  numerical  analysis  to  handle  such 
cases,  or  to  examine  the  behavior  of  the  system  when  the  maximum 
lies  on  the  boundary  of  the  domain,  rather  than  the  interior  case 
(which  is  the  general  case) .  It  suffices  to  say  that  the  M.L.E. 
function  is  continuous  on  a  compact  subset  of  Euclidean  space, 
therefore  a  maximum  exists. 

E .  The  Relation  of  Maximal  Likelihood  Estimates  to  the 
Least  Squar e  Method 

Recall  that  the  least  square  estimates  D(.i.,2)  ,  i|Q 


are  obtained  by  minimizing  each  of  the  following  expressions 


p  =  2  C(i,l)  *C(i,2)  [DU,l)D(i,2) 

ieQ 


D(£/2)D(i,l) ]2 
for  £,|q 


(1) 


where  the  C's  and  D's  are  elements  of  the  matrices  used  in 
section  .’ill. 


To  digress  a  minute,  consider  the  random  variable 

K 

Z  =  I  L 
3=1  D 


(2) 


where  the  £'s  are  independent  Bernoulli  random  variables  with 
parameter  (expectation)  p^  and  K  is  an  integer  valued  random 
variable.  Then  this  is  a  stochastic  representation  of  the  number 
of  ships  in  the  Jl-th  square  when  the  variable  K  denotes  the 
total  number  of  ships  in  the  area.  Now  the  minimizing  value  of 
D ( £ , 2 )  for  (see  equation  1  above)  is  the  estimator  of  the 

expected  value  of  Z ,  which  is 


E  [Zj  =  p^E  [K]  (3) 

[In  the  situation  where  there  is  partial  coverage  of 
squares  (or  multiple  coverage  of  part  of  a  square  by  different 

flights) ,  then  the  above  heuristic  model  must  be  expanded,  since 
now  the  estimator  K  need  not  be  an  integer.  In  such  a  case  the 

modification  is 


k 

z  =  E  £.  +  n 

3=1  3 

where  k  is  the  integral  portion  of  K  and  n  is  a  discrete  random 
variable  related  to  the  fractional  portion  of  the  random  vari¬ 
able  K.  The  expected  value  relation  (3)  still  holds  however.] 


Now  the  maximal  likelihood  estimates  of  K  and  the  p  's 
are  such  as  to  make  Kp^  approximate  D(£,2) 

and  D { 1) p^  approximate  D(£,l),  i.e., 

D (i , 2)  Kp^ 

for  ieQ 

D  (i  ,  1 )  **  D(l)pi 
D(£,l)  **  D(i)pa  for 

D(1.2)=KPl  (4) 

Thus  system  (4)  will  make  the  term 

[D(£,l)D(i,2)  -  D(£,2)D(i,l) ]  2 

small  for  all  ieQ  and  £|q,  and  hence  it  will  make  p  small. 

This  shows  that  the  least  square  estimates  of  D(£,2) ,  £^Q 
must  approximate  the  maximal  likelihood  estimates.  However  rhe 
numerical  calculation  of  D(£,2)  by  weighted  least  squares  is 
much  shorter  than  that  of  maximal  likelihood  estimates. 

F.  The  Case  N  =  3 

We  shall  illustrate  the  calculation  of  maximal  likelihood, 
estimates  for  . . . ,KM,p^, . . . pN  for  the  case  M  =  3.  The  calcula¬ 
tions  are  much  more  complicated  than  the  case  M  =  2.  Thus  we 
appreciate  even  more  the  quick  method  of  least  squares. 

Corresponding  to  eq.  (4)  of  subsection  D,  we  have 


K  ■  ( 2K . -K  .  ) 

_ 3-  3  3 

12K2 (K.-k . ) 2 
3  3  3 


K.  K  . 

2K  .  (K  .  -K  .')  -1  log  k7^k~  +  log(1-!3j)  =  0 
3  3  3  3  3  J 


j  = 


2,3 


(i) 


is  achieved. 


V.  Analysis  of  a  Specific  Case 

This  partial  data  set  algorithm  was  utilized  in  the 
analysis  of  data  obtained  in  the  Alboran  basin.  The  flight 
tracks  varied  widely,  resulting  in  partial  coverage  for 
almost  all  flight  days;  in  some  instances,  less  than  a 
quarter  of  the  squares  were  covered  (although  this  was 
very  unusual) . 

There  were  three  types  of  flights  that  gathered  data 
on  the  Alboran  during  the  exercise: 

(a)  Aircraft  Surveillance  flights  performed 
during  August. 

(b)  Aircraft  Surveillance  flights  performed 
during  November. 

(c)  Other  flights  which  performed  shipping 
surveillance.  These  flights  had 

as  their  primary  tasks  other  duties 
which  allowed  shipping  data  to  be 
gathered  at  the  same  time.  These 
covered  a  period  of  several  months. 

There  were  a  few  flights  in  groups  (a)  and  (b)  and  several 
in  group  (c) .  Quality  of  the  data  varied  widely;  most 
flights  in  groups  (a)  and  (b)  obtained  excellent  data, 
while  several  flights  in  group  (c)  had  radar  difficulties. 
The  time  period  of  interest  was  November;  thus,  August 
shipping  densities  were  scaled  to  reflect  the  lighter 
traffic  in  that  region  during  November.  Also  they  were 
weighted  much  lighter  than  the  November  flights.  The 
representative  density  obtained  for  November  is  shown  in 
figure  3.  The  confidence  in  the  results  depends  upon  the 
underlying  data.  In  the  Alboran  Sea  itself,  much  data 
was  available  and  the  results  are  felt  to  be  very  trust¬ 
worthy.  In  the  extreme  eastern  region  of  the  area  (squares 
12-14,  22-24,  33-35,  46-47)  little  data  was  available 


(only  two  flights  gathered  data  on  these  squares) ;  thus, 
historical  data  for  these  squares  was  entered  as  well  as 
the  observed  data.  Confidence  for  these  end  squares 
naturally  is  much  lower  than  for  the  important  areas. 

A  fishing  fleet  was  observed  on  one  flight  of  group 
(c) ,  which  naturally  produced  a  very  high  density  for  that 
square  on  one  day.  Since  the  algorithm  maximized  overall 
correlation,  the  resultant  shipping  field  did  not  show 
this  abnormality  in  the  square. 

It  is  interesting  to  note  that  densities  were  com¬ 
puted  both  with  and  without  the  final  data  set  (taken  2 
December),  and  they  agreed  within  3%,  indicating  that  the 
additional  data  was  (statistically)  very  similar  to  the 
other  data  sets. 

This  algorithm  has  provided  realistic  shipping  den¬ 
sity  estimates  based  on  actual  exercise  data  of  vastly 
varying  quality.  It  has  shown  relative  insensitivity  to 
singularly  high  data  points  which  are  inconsistent  with 
the  remainder  of  the  data.  Therefore,  we  feel  that  this 
algorithm  will  give  an  accurate  and  robust  estimate  of 
the  shipping  field  under  typical  conditions  encountered 
in  actual  exercises. 


is  bssi 

tocjiisksd  cp  bw 


VI .  Theoretical  Sensitivity  Analysis 

In  this  section  we  consider  the  sensitivity  of  the  weighted 
average  algorithm.  To  do  this  a  theoretical  stochastic  model 
will  be  utilized,  and  the  statistics  of  the  estimator  (i.e.,  the 
computed  density)  will  be  examined. 

If  an  area  of  open  ocean  is  small  (in  comparison  to  the 
world) ,  then  the  number  of  ships  in  that  area  at  a  fixed  time 
can  be  modelled  by  a  Poisson  distribution  whose  parameter  X  is 
the  expected  number  of  ships  in  the  area.  As  before,  this  small 
area  is  taken  to  be  a  square  of  suitable  dimensions.  The  air¬ 
craft  surveillance  days  are  sufficiently  separated  that  a  ship 
in  the  square  on  one  day  will  have  left  the  square  by  the  next 
surveillance  day,  thus  we  may  assume  that  the  ship  counts  or. 
different  days  are  independent  (the  basic  premise  of  statistical 
sampling) . 

Therefore  consider  a  square  which  contains  (on  the  average) 

X  ships,  and  let  these  be  n  surveillance  flights.  Let  p^  be  the 
coverage  of  the  i-th  flight.  Let  be  a  sequence  of  independent 
random  variables  having  Poisson  distributions  with  parameters 
Xp^  respectively.  Then  the  weighted  average  (section  II,  equation 
6)  is  represented  by  the  random  variable 


D  =  SXi/s 


(1) 


where 


s  =  Zpi 


If  Z  is  a  random  variable  having  a  Poisson  distribution 
with  parameter  Xs,  then  D  is  a  law  equivalent  to  Z/s,  i.e.,  they 
have  the  same  distribution 


D 


Z/s 


(2) 


Since  we  are  only  interested  in  the  distribution,  the  cumbersome 
term  'law  equivalent'  will  be  replaced  by  the  common  'equals’. 


The  random  variable  i  is  Poisson,  hence 


E(D)  =  X  (3) 

Var(D)  =  X/s 

so  that  as  s  tends  to  infinity,  D  tends  to  the  constant  X  almost, 
surely, 

Lim  D  =  X  (4) 

s->-» 

Consider  the  random  variable 


Y  =  S~b~  (D  -  X) 

where  characteristic  function  is 

4>y(t)  =  E(elYt)=exp{Xs(elt/v/®-l)-Xit/£} 

Now 

Lim  ( t )  =  exp  ~  Xt2/2 

S->-oo 

uniformly  (in  t)  on  compact  sets,  hence  the  random  variable  Y 
converges  in  law  to  a  normally  distributed  random  variable  with 
mean  0  and  variance  A. 

Therefore  for  large  values  of  s,  the  distribution  of  D  may 
be  approximated  by  a  normal  distribution  with  mean  X  and  vari¬ 
ance  X/s  (compare  with  equation  3) . 

To  evaluate  the  algorithm,  a  natural  question  to  ask  is: 
Given  that  the  density  is  X,  where  do  the  .025  and  .975  points 
on  the  distribution  of  D  lie?  That  is,  find  the  largest  dT  and 
the  smallest  dy  such  that 

Pr (D  <  dT )  <  .025 

li 


Pr (D  >  d„)  >  .975 


(5) 


Then  we  have 


Pr(D  e  [dL,  dyj)  i  .95 

i.e.,  95%  of  the  time  the  algorithm  would  give  a  value  in  this 
interval  For  large  values  of  s  we  may  use  the  normal  approxima¬ 
tion  to  obtain  the  value 


dL  <=  X  -  1.96/X7s 
dy  =  X  -  1.96/XTs 

For  small  values  of  s  these  limits  should  be  computed  directly 
from  the  Poisson  distribution.  If  we  only  consider  the  relative 
bounds  (i.e.,  by  dividing  by  the  mean),  then  we  can  express  the 
equations  in  terms  of  the  single  parameter  &  where 

3  —  Xs 


viz. , 


<$L  =  dL/X  *  1  -  1.96//T 
Sy  =  dy/X  =  1  +  1.96 //T 


(6) 


Figure  4  graphs  these  two  normally  approximated  bounds,  as  well 
as  giving  the  actual  bounds  from  the  Poisson  distributed  variable 
Z.  Note  that  the  Poisson  (relative)  bound,  being  an  integral 
multiple  of  1/3,  does  not  converge  monoton ically  as  the  normal 
bound  does,  but  is  discontinuous. 

Now  consider  a  problem  which  is  quite  different  yet  often 
confused  with  the  previous  one.  Specifically,  given  an  estimate 

/v 

X  of  the  density  in  a  square  (with  a  fixed  coverage  s)  ,  construct 
a  confidence  interval  for  the  actual  density  X.  That  is,  we 
wish  to  find  the  greatest  lower  bound  XT  such  that  if  this  (or 
any  smaller  value)  is  the  true  density,  then  the  probability  that 

A 

the  estimated  density  is  X  or  greater  is  less  than  a  prespecified 
amount  (e.g.,  .025).  A  similar  statement  holds  for  an  upper 


bound  X„.  This  can  be  expressed  as  the  largest  XT  and  the  smallest 

U  L 

Xy  such  that 


Pr (D  >  X  j  X  <  Xh)  <  .025 

(7) 

Pr (D  <  X  |  X  >  Xy)  <  .025 

A 

Note  that  XL  and  Xy  are  functions  of  both  X  and  s.  Following  the 
previous  analysis  we  introduce  new  parameters,  viz.. 


0 


L 


XLS 


3y  -  XyS 

0  =  Xs  (8) 


Once  again  it  turns  out  that  there  is  only  one  essential  parameter, 

A 

for  8t  and  8„  may  be  expressed  solely  in  terms  of  B.  If  we  recall 
jj  u 

the  curves  of  Figure  4,  where  the  abscissa  was  the  parameter  3f 

A 

then  the  values  of  0^  and  By  {for  a  given  B)  may  be  conceived 
graphically  as  the  intersections  with  a  rectangular  hyperbola  of 

A 

parameter  B»  as  illustrated  in  Figure  5. 


Analytically,  of  course,  gL  is  the  solution  of  the  equation 


b/bl  =  5y(3L)  *  bl  e  (o,  «) 

and  gy  satisfies  the  equation 

3/By  =  ^(By) ,  By  6  (0,  «) 

Solving  these  equations  in  the  asymptotic  case  (i.e.,  using  the 
normal  distribution  rather  than  the  Poisson)  yields 

gL  =  {/g  +  .96QT  -  . 98 } 2 

gu  =  { /g'  +  .9604  +  .  98 }2  (9) 

where,  of  course,  all  square  roots  are  positive.  Graphs  of  these 

functions  are  presented  in  Figure  6.  Note  that  gT  and  g„  are 

xj  U 

the  two  sections  of  the  parabola 


(g  -  g)2  =  3 . 8416g 

From  these  graphs  one  may  immediately  obtain  a  confidence  interval 
for  the  true  density  X  using  the  equations  of  system  (8) . 
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2.  The  LRAPP  documents  listed  in  enclosure  (1)  have  been  downgraded  to 
UNCLASSIFIED  and  have  been  approved  for  public  release.  These  documents  should 
be  remarked  as  follows: 

Classification  changed  to  UNCLASSIFIED  by  authority  of  the  Chief  of  Naval 
Operations  (N772)  letter  N772A/6U875630,  20  January  2006. 

DISTRIBUTION  STATEMENT  A:  Approved  for  Public  Release;  Distribution  is 
unlimited. 

3.  Questions  may  be  directed  to  the  undersigned  on  (703)  696-4619,  DSN  426-4619. 


BRIAN  LINK 
By  direction 


Subj :  DECLASSIFICATION  OF  LONG  RANGE  ACOUSTIC  PROPAGATION  PROJECT 
(LRAPP)  DOCUMENTS 

DISTRIBUTION  LIST: 

NAVOCEANO  (Code  N121LC  -  Jaime  Ratliff) 

NRL  Washington  (Code  5596.3  -  Mary  Templeman) 

PEO  LMW  Det  San  Diego  (PMS  181) 

DTIC-OCQ  (Larry  Downing) 

ARL,  U  of  Texas 

Blue  Sea  Corporation  (Dr.Roy  Gaul) 

ONR  32B  (CAPT  Paul  Stewart) 

ONR  3210A  (Dr.  Ellen  Livingston) 

APL,  U  of  Washington 

APL,  Johns  Hopkins  University 

ARL,  Penn  State  University 

MPL  of  Scripps  Institution  of  Oceanography 

WHOI 

NAVSEA 

NAVAIR 

NUWC 

SAIC 


Declassified  LRAPP  Documents 


